Skip to content

[CI] Run hipSPARSELt when hipBLASLt subtree changes#7514

Open
tony-davis wants to merge 7 commits into
developfrom
users/todavis/hipsparse-test-dependency
Open

[CI] Run hipSPARSELt when hipBLASLt subtree changes#7514
tony-davis wants to merge 7 commits into
developfrom
users/todavis/hipsparse-test-dependency

Conversation

@tony-davis
Copy link
Copy Markdown
Contributor

@tony-davis tony-davis commented May 15, 2026

Tracking: ROCM-25320[CI] hipSPARSELt is not tested when hipBLASLt changes (under-tested downstream)

Why this is still open after #7219: #7219 drives test deps from TheRock's CMake TEST_SUBPROJECTS, which is the right long-term shape, but as written it adds hipsparselt to projects_to_test for the blas row without activating the sparselt optional matrix project — so the build does not contain hipSPARSELt artifacts and the tests do not actually run. See discussion thread and ROCM-25320 for full context. This PR remains the immediate fix; once #7219 grows an optional-matrix-project activation hook, SUBTREE_EXTRA_MATRIX_PROJECTS can be removed.

Fixes #7519

What this PR does

  • Adds SUBTREE_EXTRA_MATRIX_PROJECTS in .github/scripts/therock_matrix.py: when the changed subtree list includes projects/hipblaslt, the matrix also includes sparselt, so existing additional_options["sparselt"] merges -DTHEROCK_ENABLE_SPARSE=ON and hipsparselt into the same blas TheRock job as today's hipSPARSELt-only PRs.
  • Updates .github/scripts/tests/therock_matrix_test.py: importlib.reload between tests (module-level project_map is mutated), asserts hipsparselt for hipblaslt-only subtrees, and aligns the miopen+rocwmma expectation with one combined matrix row.

Background: #7519. Same dependency-tree theme as TheRock #3491 (over-testing / unrelated blocking) vs. this change (under-testing a real downstream).

Risk analysis

Area Assessment
Application code None; CI selection + tests only.
CI / developer workflow Medium: hipBLASLt-only PRs gain sparselt cost (sparse + hipSPARSELt tests) in the blas job; watch runtime and flakes after merge.
Selector regressions Mitigated by unit tests (hipblaslt-only subtrees → hipsparselt in projects_to_test).

This PR's presubmit TheRock job still expands to all subtrees (smoke) because it touches .github/scripts/therock* (therock_configure_ci.py behavior). That is not the matrix shape of a normal hipBLASLt code PR; narrow behavior is covered by pytest and by post-merge spot-checks on hipBLASLt-only diffs.

Test plan

  • cd .github/scripts && python -m pytest tests/therock_matrix_test.py -v
  • After merge: one hipBLASLt-only PR (no .github/**/therock*) — confirm TheRock blas matrix includes hipsparselt
  • After merge: brief watch on hipBLASLt PR CI time / new failures

tony-davis and others added 2 commits May 15, 2026 12:51
Map projects/hipblaslt to also activate the sparselt optional stack so hipsparselt builds and tests with the blas TheRock job. Reload therock_matrix in unit tests to avoid cross-test project_map mutation; align miopen+rocwmma test with a single combined matrix entry.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds a no-op CMake comment so the PR diff includes projects/hipblaslt (non-.md paths are not CI-skippable). Drop this commit before merge if you want a CI-only diff.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions github-actions Bot added the project: none Does not target any component label May 15, 2026
@tony-davis tony-davis marked this pull request as ready for review May 15, 2026 19:15
@tony-davis tony-davis requested a review from a team as a code owner May 15, 2026 19:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an automatic CI dependency so hipBLASLt-only PR changes also trigger hipSPARSELt testing in TheRock matrix, addressing under-testing of a real downstream consumer.

Changes:

  • Introduce SUBTREE_EXTRA_MATRIX_PROJECTS mapping and apply it in collect_projects_to_run to inject extra optional matrix projects.
  • Reload module between tests to reset mutated module-level state, and add/adjust assertions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
.github/scripts/therock_matrix.py Adds extra-matrix mapping and a loop that augments the project set with optional projects (e.g. sparselt for hipblaslt).
.github/scripts/tests/therock_matrix_test.py Adds setUp reload, asserts hipsparselt is included for hipblaslt-only subtrees, and updates miopen+rocwmma expectation to a single combined entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/scripts/therock_matrix.py Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (69.24%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7514      +/-   ##
===========================================
- Coverage    66.36%   59.75%   -6.61%     
===========================================
  Files         1606     1730     +124     
  Lines       267985   320862   +52877     
  Branches     37395    48397   +11002     
===========================================
+ Hits        177825   191705   +13880     
- Misses       75219   113077   +37858     
- Partials     14941    16080    +1139     
Flag Coverage Δ *Carryforward flag
TensileLite 26.23% <ø> (?) Carriedforward from ea46a2d
hipBLAS 90.65% <ø> (ø) Carriedforward from ea46a2d
hipBLASLt 41.27% <ø> (+0.09%) ⬆️
hipCUB 82.21% <ø> (ø) Carriedforward from ea46a2d
hipDNN 85.46% <ø> (ø) Carriedforward from ea46a2d
hipFFT 50.47% <ø> (ø) Carriedforward from ea46a2d
hipSOLVER 69.24% <ø> (ø) Carriedforward from ea46a2d
hipSPARSE 85.37% <ø> (ø) Carriedforward from ea46a2d
rocBLAS 48.11% <ø> (ø) Carriedforward from ea46a2d
rocFFT 49.80% <ø> (ø) Carriedforward from ea46a2d
rocRAND 57.02% <ø> (ø) Carriedforward from ea46a2d
rocSPARSE 72.47% <ø> (ø) Carriedforward from ea46a2d

*This pull request uses carry forward flags. Click here to find out more.
see 128 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

eidenyoshida pushed a commit that referenced this pull request May 15, 2026
…hift, AssertKRingShiftTailWrapOnly, BAddrInterleave (#7513)

## Summary

Downstream **hipSPARSELt** / **TensileCreateLibrary** builds broke after
**[#7443](#7443 (*Manual
revert KRingShift*) because **TensileLite** stopped understanding
several **Assert\*** / **KRingShift** fields that were **still present**
in checked-in **Tensile logic YAML**. This PR **removes those stale
keywords** from the affected **hipBLASLt** and **hipSPARSELt** logic
files so parsing matches the reverted **Python** behavior.

## Root cause (what broke)

- **[#7443](#7443 removed
handlers in `projects/hipblaslt/tensilelite/Tensile/Contractions.py` for
keys such as **`AssertFree1DivByMT1LowbitGT1`**,
**`AssertKRingShiftTailWrapOnly`**, and related **KRingShift** /
**BAddrInterleave** plumbing.
- Shipped **YAML** under
**`projects/hipsparselt/.../Tensile/Logic/...`** (e.g. **gfx950** /
**gfx942**) and **`projects/hipblaslt/.../Tensile/Logic/...`** (e.g.
**gfx1250** GridBased) still contained those fields.
- **TensileCreateLibrary** then hit **`RuntimeError: Unknown assertion
key: AssertFree1DivByMT1LowbitGT1`** (and would be vulnerable to the
next unknown key if only partially cleaned).

So the regression was **YAML ↔ parser skew** after the revert, not the
drop Docker image or venv.

## What this PR does

- **hipSPARSELt:** strips **`AssertFree1DivByMT1LowbitGT1`**,
**`KRingShift`**, **`AssertKRingShiftTailWrapOnly`**,
**`BAddrInterleave`** from the **Equality** logic YAMLs under
**`aquavanjaram/gfx942`** and **`gfx950`** (the paths exercised by
**gfx942;gfx950** builds).
- **hipBLASLt:** same cleanup on the **8** **`gfx1250` GridBased** logic
YAMLs that still carried those keys.

No change to kernel math here—this aligns **on-disk logic** with the
**post-#7443** **TensileLite** parser.

## How we know it’s fixed

- **Math CI:** **[hipblaslt-drop-build
#117](https://math-ci.amd.com/job/image-builder/job/hipblaslt-drop-build/117/)**
completed **SUCCESS** with a **`ROCM_LIBS_REF`** that includes this fix,
exercising **hipBLASLt** + **hipSPARSELt** with **gfx942;gfx950** (full
drop-style lane).
- **This PR’s presubmit:** touches `projects/hipsparselt/**` (and
**hipBLASLt** logic), so **TheRock** runs the **hipSPARSELt** /
**sparselt** workstream for this change—not only **blas**—which directly
exercises **TensileCreateLibrary** on the updated YAML.
- **PR tag:** includes **`[project:hipsparselt]`** so **hipSPARSELt** is
**built and tested** on this PR alongside **hipBLASLt**, not only after
merge.

## CI coverage context (why #7443 slipped)

- **[#7519](#7519 —
documents that **hipBLASLt-only** PRs previously did **not** reliably
pull **hipSPARSELt** into the **TheRock** matrix, so **#7443** could
merge green while still breaking downstream **hipSPARSELt** / drop
builds.
- **[#7514](#7514 —
follow-up to wire **`projects/hipblaslt/**`** so **`sparselt`** runs
when **hipBLASLt** changes, closing that gap long-term (orthogonal to
this YAML fix, but related).

## Checklist

- [ ] Confirm no remaining **`AssertFree1DivByMT1LowbitGT1`** / stray
**KRingShift** keys in the touched trees (`git grep` on the branch).
- [ ] After merge, spot-check **TheRock** + optional
**hipblaslt-drop-build** on **`develop`**.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request May 15, 2026
[hipblaslt][hipsparselt] Removed
 AssertFree1DivByMT1LowbitGT1, KRingShift, AssertKRingShiftTailWrapOnly,
 BAddrInterleave (#7513)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

Downstream **hipSPARSELt** / **TensileCreateLibrary** builds broke after
**[#7443](ROCm/rocm-libraries#7443 (*Manual
revert KRingShift*) because **TensileLite** stopped understanding
several **Assert\*** / **KRingShift** fields that were **still present**
in checked-in **Tensile logic YAML**. This PR **removes those stale
keywords** from the affected **hipBLASLt** and **hipSPARSELt** logic
files so parsing matches the reverted **Python** behavior.

## Root cause (what broke)

- **[#7443](ROCm/rocm-libraries#7443 removed
handlers in `projects/hipblaslt/tensilelite/Tensile/Contractions.py` for
keys such as **`AssertFree1DivByMT1LowbitGT1`**,
**`AssertKRingShiftTailWrapOnly`**, and related **KRingShift** /
**BAddrInterleave** plumbing.
- Shipped **YAML** under
**`projects/hipsparselt/.../Tensile/Logic/...`** (e.g. **gfx950** /
**gfx942**) and **`projects/hipblaslt/.../Tensile/Logic/...`** (e.g.
**gfx1250** GridBased) still contained those fields.
- **TensileCreateLibrary** then hit **`RuntimeError: Unknown assertion
key: AssertFree1DivByMT1LowbitGT1`** (and would be vulnerable to the
next unknown key if only partially cleaned).

So the regression was **YAML ↔ parser skew** after the revert, not the
drop Docker image or venv.

## What this PR does

- **hipSPARSELt:** strips **`AssertFree1DivByMT1LowbitGT1`**,
**`KRingShift`**, **`AssertKRingShiftTailWrapOnly`**,
**`BAddrInterleave`** from the **Equality** logic YAMLs under
**`aquavanjaram/gfx942`** and **`gfx950`** (the paths exercised by
**gfx942;gfx950** builds).
- **hipBLASLt:** same cleanup on the **8** **`gfx1250` GridBased** logic
YAMLs that still carried those keys.

No change to kernel math here—this aligns **on-disk logic** with the
**post-#7443** **TensileLite** parser.

## How we know it’s fixed

- **Math CI:** **[hipblaslt-drop-build
#117](https://math-ci.amd.com/job/image-builder/job/hipblaslt-drop-build/117/)**
completed **SUCCESS** with a **`ROCM_LIBS_REF`** that includes this fix,
exercising **hipBLASLt** + **hipSPARSELt** with **gfx942;gfx950** (full
drop-style lane).
- **This PR’s presubmit:** touches `projects/hipsparselt/**` (and
**hipBLASLt** logic), so **TheRock** runs the **hipSPARSELt** /
**sparselt** workstream for this change—not only **blas**—which directly
exercises **TensileCreateLibrary** on the updated YAML.
- **PR tag:** includes **`[project:hipsparselt]`** so **hipSPARSELt** is
**built and tested** on this PR alongside **hipBLASLt**, not only after
merge.

## CI coverage context (why #7443 slipped)

- **[#7519](ROCm/rocm-libraries#7519 —
documents that **hipBLASLt-only** PRs previously did **not** reliably
pull **hipSPARSELt** into the **TheRock** matrix, so **#7443** could
merge green while still breaking downstream **hipSPARSELt** / drop
builds.
- **[#7514](ROCm/rocm-libraries#7514 —
follow-up to wire **`projects/hipblaslt/**`** so **`sparselt`** runs
when **hipBLASLt** changes, closing that gap long-term (orthogonal to
this YAML fix, but related).

## Checklist

- [ ] Confirm no remaining **`AssertFree1DivByMT1LowbitGT1`** / stray
**KRingShift** keys in the touched trees (`git grep` on the branch).
- [ ] After merge, spot-check **TheRock** + optional
**hipblaslt-drop-build** on **`develop`**.
tony-davis and others added 2 commits May 18, 2026 15:52
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
aledudek pushed a commit that referenced this pull request May 20, 2026
…hift, AssertKRingShiftTailWrapOnly, BAddrInterleave (#7513)

## Summary

Downstream **hipSPARSELt** / **TensileCreateLibrary** builds broke after
**[#7443](#7443 (*Manual
revert KRingShift*) because **TensileLite** stopped understanding
several **Assert\*** / **KRingShift** fields that were **still present**
in checked-in **Tensile logic YAML**. This PR **removes those stale
keywords** from the affected **hipBLASLt** and **hipSPARSELt** logic
files so parsing matches the reverted **Python** behavior.

## Root cause (what broke)

- **[#7443](#7443 removed
handlers in `projects/hipblaslt/tensilelite/Tensile/Contractions.py` for
keys such as **`AssertFree1DivByMT1LowbitGT1`**,
**`AssertKRingShiftTailWrapOnly`**, and related **KRingShift** /
**BAddrInterleave** plumbing.
- Shipped **YAML** under
**`projects/hipsparselt/.../Tensile/Logic/...`** (e.g. **gfx950** /
**gfx942**) and **`projects/hipblaslt/.../Tensile/Logic/...`** (e.g.
**gfx1250** GridBased) still contained those fields.
- **TensileCreateLibrary** then hit **`RuntimeError: Unknown assertion
key: AssertFree1DivByMT1LowbitGT1`** (and would be vulnerable to the
next unknown key if only partially cleaned).

So the regression was **YAML ↔ parser skew** after the revert, not the
drop Docker image or venv.

## What this PR does

- **hipSPARSELt:** strips **`AssertFree1DivByMT1LowbitGT1`**,
**`KRingShift`**, **`AssertKRingShiftTailWrapOnly`**,
**`BAddrInterleave`** from the **Equality** logic YAMLs under
**`aquavanjaram/gfx942`** and **`gfx950`** (the paths exercised by
**gfx942;gfx950** builds).
- **hipBLASLt:** same cleanup on the **8** **`gfx1250` GridBased** logic
YAMLs that still carried those keys.

No change to kernel math here—this aligns **on-disk logic** with the
**post-#7443** **TensileLite** parser.

## How we know it’s fixed

- **Math CI:** **[hipblaslt-drop-build
#117](https://math-ci.amd.com/job/image-builder/job/hipblaslt-drop-build/117/)**
completed **SUCCESS** with a **`ROCM_LIBS_REF`** that includes this fix,
exercising **hipBLASLt** + **hipSPARSELt** with **gfx942;gfx950** (full
drop-style lane).
- **This PR’s presubmit:** touches `projects/hipsparselt/**` (and
**hipBLASLt** logic), so **TheRock** runs the **hipSPARSELt** /
**sparselt** workstream for this change—not only **blas**—which directly
exercises **TensileCreateLibrary** on the updated YAML.
- **PR tag:** includes **`[project:hipsparselt]`** so **hipSPARSELt** is
**built and tested** on this PR alongside **hipBLASLt**, not only after
merge.

## CI coverage context (why #7443 slipped)

- **[#7519](#7519 —
documents that **hipBLASLt-only** PRs previously did **not** reliably
pull **hipSPARSELt** into the **TheRock** matrix, so **#7443** could
merge green while still breaking downstream **hipSPARSELt** / drop
builds.
- **[#7514](#7514 —
follow-up to wire **`projects/hipblaslt/**`** so **`sparselt`** runs
when **hipBLASLt** changes, closing that gap long-term (orthogonal to
this YAML fix, but related).

## Checklist

- [ ] Confirm no remaining **`AssertFree1DivByMT1LowbitGT1`** / stray
**KRingShift** keys in the touched trees (`git grep` on the branch).
- [ ] After merge, spot-check **TheRock** + optional
**hipblaslt-drop-build** on **`develop`**.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Collaborator

@geomin12 geomin12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this needs to go in pending #7219

Comment thread .github/scripts/therock_matrix.py
Comment thread .github/scripts/tests/therock_matrix_test.py Outdated
Comment thread .github/scripts/tests/therock_matrix_test.py
Comment thread .github/scripts/tests/therock_matrix_test.py
tony-davis and others added 2 commits May 28, 2026 12:22
David flagged that collect_projects_to_run() extends lists, replaces
values, and deletes keys directly on the module-level `project_map`
and `additional_options` dictionaries. That worked for the GitHub
Action's single invocation but broke any second call in the same
process - including the unit tests, which were forced to
importlib.reload(therock_matrix) between cases to mask the leak.

Take per-call deep copies of both dicts at the top of the function and
operate on those locally. Drop the importlib.reload workaround in the
tests and add a regression test that snapshots the module dicts,
exercises representative subtree combinations, and asserts the
originals are untouched.

Co-authored-by: Cursor <cursoragent@cursor.com>
Pre-commit's black hook collapsed two short multi-line calls onto
single lines. Pure formatting; no behavior change.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@davidd-amd davidd-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to hear things are moving in the right direction.

@tony-davis tony-davis requested a review from a team May 28, 2026 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] hipBLASLt PRs did not run hipSPARSELt in TheRock matrix (under-tested downstream)

6 participants